Time Variable Reinforcement Learning and Reinforcement Function Design

Authors

Andreas Daniel Matt

Ulrich Oberst

Juan Miguel Santos

Abstract

We introduce a mathematical model for time variable reinforcement learning, in which the policy, the rewards (the reinforcement function) and the transition probabilities may depend on the time t. We prove that, under certain conditions, slightly modified methods of classical dynamic programming are guaranteed to find the optimal policy. To this end we derive the Bellman equation for the time variable case and apply the fixed point theorem. Furthermore, we present a particular, flexible reinforcement function design framework with adjustable parameters and show its theoretical equivalence to general reinforcement functions. We introduce a parameter update algorithm (UPA) for the new reinforcement function that guarantees desired ratios of positive, negative and null rewards. In a series of real robot experiments we show that using this time variable reinforcement function may help to accelerate learning. We report results comparing the learning progress for wall following and obstacle avoidance behaviors implemented with Q-learning and a radial basis function network. As the main result of our work we address the effectiveness of reinforcement function design and time variable reinforcement learning in general.
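The abstract does not reproduce the time variable Bellman equation, so the following is only a minimal sketch of the general idea, not the authors' method: textbook finite-horizon backward induction in which both the transition probabilities and the rewards are indexed by the time step t. The function name `backward_induction`, the array layout `P[t][s][a][s2]` and `R[t][s][a]`, and the two-state example are all assumptions made for illustration.

```python
def backward_induction(P, R, gamma=1.0):
    """Finite-horizon dynamic programming with time-dependent dynamics.

    P[t][s][a][s2]: probability of moving from s to s2 under action a at time t.
    R[t][s][a]:     expected reward for taking action a in state s at time t.
    Returns the values at t = 0 and a time-dependent greedy policy[t][s].
    (Hypothetical layout, chosen for this sketch only.)
    """
    horizon = len(R)
    n_states = len(R[0])
    values = [0.0] * n_states  # value after the final step is zero
    policy = []
    for t in reversed(range(horizon)):
        new_values, pi_t = [], []
        for s in range(n_states):
            # One-step lookahead using the model valid at time t.
            q = [R[t][s][a] + gamma * sum(p * v for p, v in zip(P[t][s][a], values))
                 for a in range(len(R[t][s]))]
            best = max(range(len(q)), key=q.__getitem__)
            pi_t.append(best)
            new_values.append(q[best])
        values = new_values
        policy.append(pi_t)
    policy.reverse()  # policy[0] is the rule for the first time step
    return values, policy


# Toy example (assumed): action 0 stays, action 1 switches state.
step = [[[1.0, 0.0], [0.0, 1.0]],   # from state 0: stay, switch
        [[0.0, 1.0], [1.0, 0.0]]]   # from state 1: stay, switch
P = [step, step]
# Reward only at t = 1, for staying in state 0.
R = [[[0, 0], [0, 0]],
     [[1, 0], [0, 0]]]
values, policy = backward_induction(P, R)
```

In this toy problem the best action in state 1 differs between t = 0 (switch, to reach the rewarded state in time) and t = 1 (nothing left to gain), which is the kind of time dependence of the policy that the abstract refers to.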

Similar resources

Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach

This paper presents the application of reinforcement learning to automatic analog IC design. In this work, a multi-objective approach based on learning automata is evaluated for meeting the required functionality and performance specifications while minimizing MOSFET area and power consumption for two well-known CMOS op-amps. The results show the ability of the proposed method to ...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this, it employs ranking methods that rank web documents by their significance and relevance to the user's query. The novelty of this paper is a user-feedback-based ranking algorithm that uses reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

Meta Reinforcement Learning with Latent Variable Gaussian Processes

Data efficiency, i.e., learning from small data sets, is critical in many practical applications where data collection is time consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationsh...

Multiple Model-Based Reinforcement Learning

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state predict...

Publication date: 2000
